Tree-Structured Trajectory Encoding for Vision-and-Language Navigation
Abstract
Over the past few years, research on vision-and-language navigation (VLN) has made tremendous progress. Many previous works attempted to improve performance from different aspects such as training strategy, data augmentation, and pre-training. This work focuses on a rarely explored aspect of VLN, namely trajectory organization and encoding during navigation. Most existing state-of-the-art VLN models adopt a vanilla sequential strategy for encoding trajectories. Such a strategy takes the whole trajectory as a single sequence to estimate the current state, no matter whether the agent moved smoothly or made mistakes and backtracked in the past. We show that sequential encoding may largely lose this kind of fine-grained structure in the trajectory, which could hamper later state estimation and decision making. To solve this problem, this work proposes a novel tree-structured trajectory encoding strategy. The trajectory is organized as a tree rooted at the starting position and encoded using our Tree-Transformer module to fully extract the fine-grained historical information. Besides, since the spatial topology can be easily embedded in the trajectory tree, we further design a tree-based action space that allows the agent to make a long-range error correction in one decision. We implement the holistic agent based on a cross-modal transformer and train it with a newly proposed Tree-nDTW reward. On the benchmark dataset R2R, our model achieves a success rate (SR) of 68% on val-unseen and 66% on test. We further conduct extensive ablation studies and analyses to provide more insights into the effectiveness of our designs.
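As a rough illustration of the difference between a sequential and a tree-organized trajectory, the Python sketch below records visited viewpoints in a tree rooted at the starting position. It is not the authors' implementation: the names `TreeNode`, `TrajectoryTree`, `move`, `path_to_current`, and `visited` are hypothetical, the viewpoint IDs are made up, and the backtracking rule (a move to a viewpoint on the root-to-current path is treated as a return to that ancestor) is an assumption. The Tree-Transformer encoder and the Tree-nDTW reward are not shown.

```python
from typing import Dict, List, Optional


class TreeNode:
    """One visited viewpoint in the trajectory tree (hypothetical structure)."""

    def __init__(self, viewpoint: str, parent: Optional["TreeNode"] = None) -> None:
        self.viewpoint = viewpoint
        self.parent = parent
        self.children: Dict[str, "TreeNode"] = {}


class TrajectoryTree:
    """Trajectory organized as a tree rooted at the starting viewpoint."""

    def __init__(self, start: str) -> None:
        self.root = TreeNode(start)
        self.current = self.root

    def move(self, viewpoint: str) -> None:
        # Assumption: moving to a viewpoint already on the root-to-current
        # path is a backtrack, so the cursor returns to that ancestor
        # instead of a new node being appended.
        node = self.current.parent
        while node is not None:
            if node.viewpoint == viewpoint:
                self.current = node
                return
            node = node.parent
        # Otherwise grow a new branch (or re-enter an existing child).
        child = self.current.children.get(viewpoint)
        if child is None:
            child = TreeNode(viewpoint, parent=self.current)
            self.current.children[viewpoint] = child
        self.current = child

    def path_to_current(self) -> List[str]:
        # Walk parent links from the current node back up to the root.
        path: List[str] = []
        node: Optional[TreeNode] = self.current
        while node is not None:
            path.append(node.viewpoint)
            node = node.parent
        return path[::-1]

    def visited(self) -> List[str]:
        # Depth-first preorder listing of every node in the tree. Under the
        # assumption made here, any of these could serve as a target for a
        # one-step long-range correction.
        out: List[str] = []
        stack = [self.root]
        while stack:
            node = stack.pop()
            out.append(node.viewpoint)
            stack.extend(reversed(list(node.children.values())))
        return out


# The agent walks A -> B -> C, realizes C was a mistake, returns to B, then tries D.
tree = TrajectoryTree("A")
for vp in ["B", "C", "B", "D"]:
    tree.move(vp)

print(tree.path_to_current())  # ['A', 'B', 'D']
print(tree.visited())          # ['A', 'B', 'C', 'D']
```

A purely sequential record of the same walk would be `A, B, C, B, D`, which does not distinguish the abandoned branch `C` from the path currently being followed. In the tree, `C` and `D` are sibling branches under `B`.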
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i3.25494